System Hazard Analysis of a Complex Socio-Technical System: The Functional Resonance Analysis Method in Hazard Identification

Author

  • Brendon Frost
Abstract

The value of characterising systems in high-risk industries as complex socio-technical systems is growing. Using complex and socio-technical system viewpoints, high-profile industrial and transport accidents can be investigated by focusing attention on the interactions between skilled operators, technology, and automation in geographically dispersed operations, and unintended design risk factors arising from complex and non-linear interactions can be highlighted. Methods currently exist for identifying and assuring the safety of interactions between and within systems, but their modeling of these interactions is incomplete. One potentially valid method is the Functional Resonance Analysis Method (FRAM). FRAM is a qualitative approach that generates a functional (rather than structural) model of the relationships between sub/systems, and has the potential to produce inputs suitable for safety assurance and risk analysis methods. This paper presents a methodology for incorporating a modified FRAM technique within a System Hazard Analysis (SHA). In this case study, the FRAM/SHA methodology is applied to explore the validity of the approach for assessing hazards arising from structural and process changes to the operational control center of an international airline, following the introduction of a new software system into an existing suite of COTS software tools. The paper explores the methodology in terms of requirements that include the creation of a new work team, the functional division of an existing operator role, and changes to system performance and safety-critical processes.
Keywords: Socio-Technical system, Functional Resonance Analysis Method, FRAM, System Hazard Analysis, operational control center, risk analysis, COTS requirements.
1 Introduction
A significant challenge in designing and operating complex systems is the potential for unpredictable system behavior to ‘emerge’ from the complex (and often transient) interconnections that can arise under dynamic operating conditions. Limited understanding of these factors can manifest as an undefined gap between the system as intended and the system as implemented/operated (Leveson, 2011a). This gap arises partly from the limited guidance available to support the design of effective interactions between system elements (such as human operators and technology), particularly where they are separated by time and geography, and where system function is constrained by time, information uncertainty, and decentralised control mechanisms (Vicente, 1999). In addition, tools for analysing and modeling complex and non-linear system behavior (particularly in degraded performance modes), and for assessing organisational, social, and human factor impacts, are in their relative infancy compared with the traditional reliability-driven approaches used across the systems engineering lifecycle (Allenby & Kelly, 2001). This paper describes a novel adaptation of a systems modeling technique, the Functional Resonance Analysis Method (FRAM), for use within the context of a System Hazard Analysis (SHA). FRAM has been used within the Cognitive Systems Engineering domain to model and assess the complex functional interactions between elements within socio-technical systems, but primarily for accident investigation (Hollnagel, 2012). However, the technique potentially has greater utility than traditional hazard identification techniques (e.g. HAZOPs variants, FMEA/FMECA approaches) in supporting risk analysis, by identifying system-level hazards and defining the specific (and often transient) scenarios/conditions under which they arise.
The case study highlighted in this paper is the Operations Control Centre (OCC) of a large international airline, where the modified FRAM/SHA method was applied to define system functional and safety requirements prior to the introduction of new, industry-mandated software interfacing airline operations with the Australian air traffic management system. The paper outlines the utility of the FRAM technique for identifying scenario/condition-specific hazards, specifically to inform design requirements at the levels of system performance and function, safety-critical processes, and operator function and role accountabilities.
2 Complex System SHA
The identification of potential hazards in complex systems is fundamental to establishing safety and reliability across the design, production, and operational stages of the Systems Engineering (SE) lifecycle (Bahr, 1997). A complete and comprehensive picture of the hazards present or likely to exist in a system, and of how those hazards can escalate into mishap or accident scenarios, supports the early establishment of performance and safety requirements for design and operational objectives. This in turn supports a more complete analysis that reduces the uncertainty of any subsequent risk quantification, while providing higher levels of safety assurance (Seligmann et al, 2012). The SHA is an SE hazard analysis process applied during the design phases to assess the integration of designs. It aims to identify hazards arising from the functional interfaces between subsystems, as well as latent design hazards with the potential to escalate into interrelated fault events (Ericson, 2005).
Roland and Moriarty (1990) list the primary objectives of the SHA process as including consideration of:
• Compliance with specified safety criteria;
• Hazardous events, including failure or degradation of safety devices, controls, and safety constraints and functions;
• Degradation of the system’s safety levels under normal or abnormal conditions;
• The impact of design or engineering changes; and
• Human System Interfaces (HSI), including human performance errors and control functions.
The SHA process may draw from a broad range of established hazard identification techniques, but will typically commence with qualitative techniques to establish the causality of credible mishap/accident scenarios before proceeding to quantification techniques (NASA, 2011). The foundation of hazard identification and analysis (HAZID), therefore, is identifying credible mishap and accident event scenarios. However, HAZID within safety-critical systems is often challenging due to the stochastic effects of interactive and dynamic system complexity, and the presence of system intractability (or under-specification), as commonly found in complex and socio-technical systems (Hollnagel, 2012). Established risk analysis practice adopts a scenario-driven approach to the systematic review of safety-critical systems in order to identify potential mishap and accident scenarios (CCPS, 2008; Mannan, 2005). The development of credible scenarios enables system complexity to be reduced to discrete ‘snapshots’ of the system under a range of conditions and in different system states, where they can be modeled as deterministic functional relationships between system elements (Bossel, 2007). A scenario consists of an expected situation or a characteristic sequence or combination of events, and describes a generic situation that encompasses and relates a set of reasonably probable events/situations.
Khan (2001) cautions against focusing risk assessment activities on identifying ‘worst-case’ scenarios, as this may unnecessarily restrict the scope and coverage of the scenario set: he suggests that it is preferable to focus efforts on identifying “...credible accident[s] ... within the realm of possibility and likely to be severe enough to cause significant damage”. An effective risk analysis process for a complex system must therefore combine appropriate hazard identification and analysis techniques in order to generate as complete and comprehensive a set of mishap and accident scenarios as possible for subsequent quantification (Cameron & Raman, 2005). Siu (1994) specifically identifies dynamic system dependencies (e.g. common-cause initiators, functional coupling, and shared equipment/components) as requiring ‘complexity decomposition’ before the modeling approaches commonly used in risk analysis can provide valid scenario quantification. In addition, Leveson (2011a) suggests that the selection of HAZID techniques should consider the system’s complexity and socio-technical characteristics, so that the HAZID process is able to describe system scenarios resulting from dependent ‘incredible’ events and/or transient system states.
2.1 Complexity System Properties
Complex systems are typically dynamic, with changes in system complexity leading to changes in the system’s needs and objectives, which in turn ‘orient’ system behavior towards these new objectives. As a result, complex systems can exhibit changes in characteristic behaviors and states that are not easily observed or predicted, except through changes in state variables (Bossel, 2007). Perrow (1984) describes complex interactions simply as “...those of unfamiliar sequences, or unplanned and unexpected sequences, and either not visible or not immediately comprehensible.” Expanding upon this definition through the use of Systems Theory, complexity can take a number of forms (Leveson, 2011a):
• Interactive complexity between components/elements within the system, or between sub/systems.
• Dynamic complexity, or system changes in relation to time.
• Decompositional complexity, where the system’s structure and function are not obviously consistent/linked.
• Non-linear complexity, where cause and effect are intractable or not easily described or specified.
Hollnagel (2012) employs a Cognitive Systems Engineering perspective to relate dynamic and non-linear complexity to the extent to which system function is discernible or tractable: “Dynamic complexity refers to situations where cause and effect are subtle, and where the effects over time of interventions are not obvious”. This perspective is complemented by Perrow’s (1984) original concept of coupling within systems, where unintended or increased functional interactions and dependencies within a system can cause a sub-system/element event to cascade and resonate through the system, leading to ‘incredible’ event scenarios. Emergence is a system characteristic that describes the effects of decompositional and non-linear complexity within a complex system, and refers to cases where system behavior or state changes cannot be explained in terms of a direct cause-and-effect relationship to discrete underlying processes or events (Hollnagel, 2012). For example, dependence between two or more low-probability or ‘incredible’ events may occur as a result of the influence of common systemic factors, rather than via a temporal or direct cause-and-effect relationship (Leveson, 2011b; Hollnagel, 2004).
Concepts of direct causality may therefore be inadequate for predicting and describing mishap or accident event scenarios within a HAZID process, particularly where complex interactions may occur that are not easily understood or identified. Finally, Dekker, Cilliers and Hofmeyr (2011) summarise complexity as a property of distributed systems: “Complex systems are held together by local relationships only. Each component is ignorant of the behavior of the system as a whole, and cannot know the full influences of its actions. Components respond locally to information presented to them, and complexity arises from the huge, multiplied webs of relationships and interactions that result from these local actions. The boundaries of what constitutes the system become fuzzy; interdependencies and interactions multiply and mushroom.”
2.2 Socio-technical System Properties
System performance variability is a common feature of large-scale socio-technical systems, where demands arising from interaction with the external environment, and from social, organisational, and individual operator factors within the system, must be met through trade-offs against the purpose and objectives of the system, within finite time and resource constraints. As a result, a complete description of the system of work (i.e. how work is to be accomplished) is intractable, or cannot be fully specified, due to the effect of elaboration (the presence of significant detail), the rate of change (dynamic complexity), incompleteness of functional knowledge, and/or process heterogeneity and irregularity (Hollnagel, 2012). Performance variability can therefore be seen as a response to the presence of dynamic complexity in the system over time.
Socio-technical systems can be defined as having a human-intensive and organisation-focused architecture; they are an increasingly common class of large-scale system that combines technological systems (where hardware and software technology feature as significant elements within the system), human interfaces, and human-intensive organisational systems (Jackson, 2010). Common to these complex and large-scale socio-technical systems are characteristic behaviors that include (Bossel, 2007):
• Self-organisation: the system can change structure, parameters, rules, etc., to adapt to environmental demands independently of centralised control.
• Co-existence: the system modifies its behavior in order to respond to interactions with other systems from which it cannot operate in isolation.
• Self-replication: the self-organising system can be capable of generating similar systems, particularly in the case of industrial organisations.
Groth, Wang and Mosley (2010) note the difficulty of quantitatively determining the causal role of non-deterministic/uncertain factors arising from human or organisational system elements in system failure or dysfunction, particularly in the probabilistic modeling of “soft” relationships. They further note the challenges of modeling uncertain relationships between system elements, particularly where sequence, direct causality, and event independence cannot be assumed. The FRAM technique was selected for this case study because it was specifically designed to assess systems with these features, and because of its demonstrated capability for identifying the impact of both complexity and socio-technical system properties in retrospective accident analyses.
2.3 Implications for the Risk Analysis of Complex Systems
Consideration of system socio-technical and complexity factors informs the decomposition of system structure and function as a basis for identifying how perturbations of, and interactions within, the system under study can propagate in undesirable ways and lead to ‘system mishaps/accidents’ (Dekker, Cilliers & Hofmeyr, 2011; Bahr, 1997). These properties have significant implications for HAZID activities: hazards may emerge infrequently under rare combinations of circumstances or unique system states, and may be difficult to predict. As a result, HAZID techniques that assume linear causality and independence between low-probability events are likely to significantly underestimate system risk (Jackson, 2010; Leveson, 1995). Hollnagel (2012) points to the limitations of existing HAZID techniques that presume, as a starting point, a complete and unambiguous description or specification of the structure and function/s of the system, and that rest on assumptions that are invalid for complex systems:
• That a system can be meaningfully decomposed into constituent elements, i.e. that the whole is the sum of the parts.
• That these elements operate according to binary modes, i.e. elements either function or fail.
• That the system functions as intended, and that system events follow pre-determined sequences in a consistent, orderly, and linear manner.
The presence of complexity in a large-scale system therefore means that a complete understanding (and particularly a precise structural or process description) of a complex system may well be unobtainable, and that there is always likely to be a degree of uncertainty as to how the system will function under all possible conditions, respond to unforeseeable demands or changes in system objectives/needs over time, or change to suit the external environment (Hollnagel, 2012; Modarres & Cheon, 1999).
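The point that assuming independence between low-probability events understates risk can be made concrete with a small worked calculation. The sketch below is purely illustrative: the common-factor model and all three probabilities are hypothetical values chosen for the example, not data from the case study. A shared systemic factor C raises the likelihood of two rare events A and B; an independence assumption simply multiplies their marginal probabilities, whereas conditioning on C gives the actual joint probability.

```python
"""Illustrative only: how a shared systemic factor breaks the independence
assumption between two rare events. All numbers are hypothetical."""

P_C = 0.01                  # assumed probability that the common factor C is present
P_EVENT_GIVEN_C = 0.1       # assumed P(A | C), also used for P(B | C)
P_EVENT_GIVEN_NOT_C = 1e-4  # assumed P(A | not C), also used for P(B | not C)


def marginal(p_c: float, p_given_c: float, p_given_not_c: float) -> float:
    """P(A) (and, by symmetry, P(B)) by total probability over C."""
    return p_c * p_given_c + (1 - p_c) * p_given_not_c


def joint(p_c: float, p_given_c: float, p_given_not_c: float) -> float:
    """P(A and B), treating A and B as independent only once C is known."""
    return p_c * p_given_c ** 2 + (1 - p_c) * p_given_not_c ** 2


p_a = marginal(P_C, P_EVENT_GIVEN_C, P_EVENT_GIVEN_NOT_C)
p_joint_common_cause = joint(P_C, P_EVENT_GIVEN_C, P_EVENT_GIVEN_NOT_C)
p_joint_independent = p_a * p_a  # what an independent-event analysis predicts
underestimate = p_joint_common_cause / p_joint_independent

if __name__ == "__main__":
    print(f"P(A) = P(B)                      = {p_a:.3g}")
    print(f"P(A and B), independence assumed = {p_joint_independent:.3g}")
    print(f"P(A and B), common factor        = {p_joint_common_cause:.3g}")
    print(f"Underestimation factor           = {underestimate:.0f}x")
```

With these assumed values the independence assumption understates the joint probability by a factor of roughly 80; the size of the gap depends entirely on how strongly the hypothetical common factor couples the two events.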
Similarly, socio-technical system features (such as intractability and performance variability) mean that a structural decomposition is unlikely to provide adequate insight into a system, and may lead to the identification of an incomplete set of event scenario descriptors for HAZID and subsequent risk analysis (Rasmussen & Petersen, 1999). A number of approaches have been suggested for conducting HAZID processes that take complexity and socio-technical factors into account; however, to date none has achieved widespread adoption outside specific communities of practice. Most address these challenges by modifying existing techniques/approaches, but two techniques have been developed specifically to address this need:
• System Theoretic Process Analysis (STPA: Leveson, 2011a, 2004); and
• Functional Resonance Analysis Method (FRAM: Hollnagel, 2012, 2004).
FRAM was chosen as the basis for this research partly because of the body of literature available, but primarily because of its potential for modeling graduated/degraded functional variability (Herrera & Woltjer, 2010). FRAM is most likely to be of value from the detailed design phase of the Systems Engineering lifecycle onwards, once a detailed Concept of Operations (CONOPS) is available. In this context FRAM could replace or complement established techniques such as HAZOPs and Functional Failure/Hazard Analysis, and could identify system hazards missed during the Preliminary Hazard Identification (PHI) process.
3 FRAM in Context
The Functional Resonance Analysis Method (FRAM) is a qualitative analysis technique that supports the modeling of complexity and socio-technical factors, including the interfaces between adaptable human agents and technology, coupling and dependence effects, non-linear dependencies between sub-systems, and functional performance variability (Woltjer & Hollnagel, 2008b).
FRAM has previously been used to conduct qualitative system analyses: initially as an accident investigation technique (see: De Carvalho, 2011; Hollnagel et al, 2008; Nouvel et al, 2007; Sawaragi et al, 2006), and more recently as a self-contained qualitative risk assessment method to inform design activities for large distributed systems (see: Belmonte et al, 2011; Herrera & Woltjer, 2010; Macchi et al, 2008; Woltjer & Hollnagel, 2008a; Woltjer & Hollnagel, 2008b). To date, FRAM analyses have been undertaken for air traffic management, rail transport, financial market, and nuclear waste transport systems. FRAM theory defines some terms differently from their common use in HAZID activities:
• Functions represent the set of activities (the actual or likely work done, rather than an idealized work-as-imagined) required to produce an outcome or achieve sub/system objectives. More formally, a function is associated with activity intended to produce something of relevance to the system’s objectives or to change system state; the function’s output describes a system condition or state (Hollnagel, 2012).
• Mishap/accident scenarios are seen as a product of uncontrolled hazards that emerge from performance variability and lead to unintended or increased functional interactions and dependencies within a system, causing a sub-system/element event to cascade and resonate through the system (Hollnagel, 2004).
• Performance variability is considered to arise from the intractability of work management within complex systems, as agents independently trade off efficiency against thoroughness (the Efficiency-Thoroughness Trade-Off: ETTO) in achieving the purpose and objectives of the system (Hollnagel, 2009).
• Failure is defined differently from existing HAZID techniques, as “the temporary or permanent loss of a system’s ability to anticipate risks and make proactive approximate adjustments to understand and adjust to the current conditions (resources, demands, conflicts, interruptions, underspecified work requirements)” (Hollnagel, 2013b).
FRAM supports a systemic decomposition methodology, with analysis of a unique FRAM model (of a specific system) describing the functionality needed to meet the system’s objectives and the range of functional variation that supports the achievement of these objectives (Hollnagel, 2013a). By characterising the variability of these functions, specific instantiations (or snapshots) of the system under defined situations and conditions can be determined, identifying how interactions and relationships within the system (and with other systems) reconfigure and lead to undesirable functional variability and resonance within the system (Hollnagel, 2012). This is where FRAM differs conceptually from established HAZID techniques: it focuses on determining the likelihood of functional variability rather than the probability of malfunction or failure (in the traditional sense of loss of function, or binary failure modes). In application, however, the FRAM approach remains compatible with the established HAZID techniques used in many industries, in that it guides the decomposition and/or analysis of the system under study in order to identify plausible scenarios that form the basis for subsequent risk assessment activities.
3.1 The FRAM Methodology
In this case study, FRAM has been adapted into a HAZID technique that supports activities to build and validate a functional model of the system under study. The functional characteristics of the resulting FRAM model are then “perturbed” to identify the hazards arising from coupling and system interactions within defined scenarios.
The FRAM process, as outlined by Hollnagel (2012) and as applied in this research, includes three key process steps:
• Step 1: Functional Identification and Description. The first step in building a FRAM model is the identification and description of the actual (or likely) functions, rather than the idealized functions, i.e. the activities that represent the likely “work-as-done”. This includes characterizing the functions and identifying the possible linkages (couplings) between functions via each of the defined characteristics.
• Step 2: Performance Variability Description. Once a model has been defined, the variability of its functions is determined, effectively creating the ‘instantiations’ or scenario precursors that can be used in HAZID workshops. Of direct interest is the variability of a function’s output, as this is the aspect that can affect coupling and interactive complexity, and is therefore the representation of performance variability.
• Step 3: Aggregation of Performance Variability. The remaining step in the FRAM analysis process is to determine how performance variability can combine and drive non-linear system effects and outcomes through “upstream-downstream coupling” of functions.
(Note: The FRAM approach established in the literature has an additional step that generates the productive output of a stand-alone process; this step has been omitted in this research as redundant, owing to the integration of FRAM into the SHA process.)
Using FRAM in a prospective hazard analysis therefore involves building a system model from its constituent functions, with six characteristics used to define each function: the input/trigger that starts the function, relevant time constraints, limiting control/s, resources required, preconditions needing to be met, and the output’s quality determinants. The functions can then be linked to build a functional baseline model of the system that identifies coupling and interactions under defined (ideal) conditions.
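The six-aspect function description and the linking step lend themselves to a simple data representation. The following Python sketch is one possible encoding, assuming that a coupling exists wherever an output label of one function matches, word for word, a label in the input, time, control, resource, or precondition aspect of another. The class FramFunction, the helper find_couplings, and the example functions (loosely based on the disruption-management labels shown in figure 1, with aspect assignments assumed) are hypothetical; they are not part of Hollnagel’s published method or the case-study model.

```python
from dataclasses import dataclass, field

# Aspects that can receive another function's output; the output aspect is the sender.
RECEIVING_ASPECTS = ("input", "time", "control", "resource", "precondition")


@dataclass
class FramFunction:
    """A FRAM function described by its six aspects (hypothetical schema)."""
    name: str
    input: list[str] = field(default_factory=list)
    output: list[str] = field(default_factory=list)
    time: list[str] = field(default_factory=list)
    control: list[str] = field(default_factory=list)
    resource: list[str] = field(default_factory=list)
    precondition: list[str] = field(default_factory=list)


def find_couplings(functions: list[FramFunction]) -> list[tuple[str, str, str, str]]:
    """List (upstream, downstream, aspect, label) wherever an output label of one
    function appears verbatim in a receiving aspect of another function."""
    couplings = []
    for upstream in functions:
        for label in upstream.output:
            for downstream in functions:
                if downstream is upstream:
                    continue
                for aspect in RECEIVING_ASPECTS:
                    if label in getattr(downstream, aspect):
                        couplings.append((upstream.name, downstream.name, aspect, label))
    return couplings


# Hypothetical fragment of a baseline OCC model; aspect assignments are assumed.
model = [
    FramFunction("Detect schedule disruption",
                 output=["Disruption detected"]),
    FramFunction("Initiate disruption plan",
                 input=["Disruption detected"],
                 output=["Disruption management plan initiated"]),
    FramFunction("Coordinate disruption plan execution",
                 input=["Disruption management plan initiated"],
                 resource=["Crew availability data"],
                 output=["Disruption management plan in effect"]),
    FramFunction("Provide crew availability",
                 output=["Crew availability data"]),
]

couplings = find_couplings(model)

if __name__ == "__main__":
    for up, down, aspect, label in couplings:
        print(f"{up} --[{label}]--> {down} ({aspect})")
```

Exact string matching is a deliberate simplification for the sketch; in a workshop setting, deciding whether an output and an aspect describe the same thing is an analyst judgement.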
This is followed by the development and analysis of a number of scenarios or instantiations of the model that identify how functions can be coupled under a range of favorable or unfavorable conditions. Details of the standard approach to applying FRAM can be found in Hollnagel (2012). While most of the FRAM process involves developing the model in table format, FRAM models and instantiations can also be represented graphically, using a modular ‘hexagon’ representation of each function and its six defining characteristics. An example of a FRAM module graphic is shown in figure 1. The FRAM/SHA method used in this research differs from the published method (Hollnagel, 2012; Hollnagel, 2004) as follows:
• Once the model is constructed, a small, representative group of system experts is consulted, both to refine and confirm that the model is representative of the intended functional system, and to advise on the nature of credible functional ‘perturbations’ likely in the system.
• The baseline FRAM model is then reviewed by a group of system experts in a facilitated HAZID workshop following a guideword process, in order to identify hazards, their escalation paths to incidents/accidents (represented by sets of discrete scenarios, which formed the primary outputs of the HAZID process), and the barriers in place to prevent this escalation.
• The validated model and credible system perturbations are then used to develop ‘instantiations’ of the model, representing the revised couplings between functions and any impacts on function performance/efficacy under defined conditions.
• In this case study, the HAZID workshop outputs and three FRAM instantiations were then used as the basis for developing risk models.
The guideword concept was adopted from (amongst others) the popular HAZOPs technique to improve the efficiency of the HAZID workshop process through consistent prompting and guidance of system experts.
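The instantiation step and the upstream-downstream aggregation of variability can be sketched as operations on a map of couplings. This is a deliberate simplification, assuming that variability can propagate along any coupling: plain reachability shows which downstream functions an instantiation newly exposes, but it does not capture how variability dampens or amplifies as it combines, which in the method described here is assessed by the system experts. The coupling map, the helpers instantiate and downstream_of, and the added coupling are all hypothetical.

```python
from collections import deque

# Hypothetical baseline couplings: upstream function -> downstream functions.
BASELINE = {
    "Detect schedule disruption": ["Initiate disruption plan"],
    "Initiate disruption plan": ["Coordinate disruption plan execution"],
    "Provide crew availability": ["Coordinate disruption plan execution"],
    "Coordinate disruption plan execution": ["Notify air traffic management"],
}


def instantiate(baseline, added=(), removed=()):
    """Return a new coupling map reflecting the revised couplings of an
    instantiation; the baseline map itself is left unchanged."""
    model = {up: list(downs) for up, downs in baseline.items()}
    for up, down in removed:
        if down in model.get(up, []):
            model[up].remove(down)
    for up, down in added:
        downs = model.setdefault(up, [])
        if down not in downs:
            downs.append(down)
    return model


def downstream_of(model, source):
    """Breadth-first search for every function that variability in the
    output of `source` could reach through upstream-downstream coupling."""
    reached, queue = set(), deque([source])
    while queue:
        current = queue.popleft()
        for nxt in model.get(current, []):
            if nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)
    return reached


# Hypothetical condition: under time pressure, a crew shortfall directly
# triggers plan initiation, a coupling absent from the baseline.
degraded = instantiate(
    BASELINE, added=[("Provide crew availability", "Initiate disruption plan")]
)

baseline_reach = downstream_of(BASELINE, "Provide crew availability")
degraded_reach = downstream_of(degraded, "Provide crew availability")
newly_exposed = degraded_reach - baseline_reach

if __name__ == "__main__":
    print("Baseline reach:     ", sorted(baseline_reach))
    print("Instantiation reach:", sorted(degraded_reach))
    print("Newly exposed:      ", sorted(newly_exposed))
```

Comparing each instantiation against the baseline, rather than analysing it in isolation, keeps attention on the couplings that the defined condition actually changes.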
The function aspects defined in FRAM (i.e. input, time, preconditions, resources, controls, output) can be considered equivalent to the ‘keywords’ concept from HAZOPs, which act to focus attention on the parameters of interest. Specific guidewords were developed to facilitate the FRAM process and to ensure consideration of two different aspects of function variability:
• Factors affecting the variability of the function itself due to its operation (i.e. internal influences on variability), through determining different aspects of the output variability; and
• Factors affecting inputs to the function that alter its Input, Time, Control, Resource, or Precondition characteristics (i.e. external influences on variability).
Consequently, the modified FRAM method uses two separate sets of guidewords, applied in two sequential ‘passes’ across the baseline FRAM model by the system experts, to identify hazards arising from interactivity, defined conditions, and the effect of potential variability on function coupling and performance:
Pass 1: Determination of function variation where the input, time, control, precondition, or resource is Early, Delayed, Absent, Wrong Rate, Under-specified (insufficiently constrained), or Over-specified (too constrained).
Pass 2: Determination of variation where the function is impacted by the specific conditions or socio-technical and individual factors of:
• Time pressures/constraints/duration
• Information certainty/sufficiency
• Quality of human-technology interfaces
Figure 1: Example FRAM module (hexagon) for function OF5, “Coordinate disruption plan execution”, with aspects labelled I, P, R, O, T, C. Surviving labels: Disruption management plan initiated; Priority activity; Disruption management plan in effect; Plan legal & authorised; Training & experience.
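To show how the two guideword passes can be expanded into a structured workshop agenda, the following Python sketch generates one prompt row per function, aspect, and guideword. It is a hypothetical aid rather than the tool used in the case study: the row layout and the worksheet helper are assumptions, and Pass 2 includes only the three conditions listed in this excerpt.

```python
from itertools import product

# Pass 1: aspects and guidewords as listed in the text.
PASS1_ASPECTS = ("input", "time", "control", "precondition", "resource")
PASS1_GUIDEWORDS = ("Early", "Delayed", "Absent", "Wrong Rate",
                    "Under-specified", "Over-specified")
# Pass 2: only the conditions listed in this excerpt; the full list may be longer.
PASS2_CONDITIONS = ("Time pressures/constraints/duration",
                    "Information certainty/sufficiency",
                    "Quality of human-technology interfaces")


def worksheet(function_names):
    """Build workshop rows: (function, pass number, aspect or condition, guideword)."""
    rows = []
    for name in function_names:
        # Pass 1: every aspect crossed with every guideword.
        for aspect, guideword in product(PASS1_ASPECTS, PASS1_GUIDEWORDS):
            rows.append((name, 1, aspect, guideword))
        # Pass 2: each condition is considered once per function, with no guideword.
        for condition in PASS2_CONDITIONS:
            rows.append((name, 2, condition, ""))
    return rows


rows = worksheet(["Coordinate disruption plan execution"])

if __name__ == "__main__":
    for function, pass_no, target, guideword in rows[:3]:
        print(f"Pass {pass_no} | {function} | {target} | {guideword}")
    print(f"... {len(rows)} prompts in total")
```

For a single function this yields 30 Pass 1 rows (five aspects times six guidewords) plus 3 Pass 2 rows, and the total grows linearly with the number of functions in the model.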


Publication date: 2014